RALI: Automatic Weighting of Text Window Distances

Authors

  • Bernard Brosseau-Villeneuve
  • Noriko Kando
  • Jian-Yun Nie
Abstract

Systems that use text windows to model word contexts have mostly relied on fixed-size windows and uniform weights. The window size is often selected by trial and error to maximize task results. We propose an unsupervised method for selecting a weight for each window distance, effectively removing the need to limit window size, by maximizing the mutual generation of two sets of samples of the same word. Experiments on SemEval Word Sense Disambiguation tasks showed considerable improvements.
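To make the idea of per-distance weights concrete, the following is a minimal sketch, not the authors' implementation. It only shows how a context vector changes when each window distance carries its own weight instead of a uniform weight inside a fixed window. The function name `weighted_context`, the example sentence, and the weight values are all hypothetical; the abstract states that the weights are selected by maximizing the mutual generation of two sample sets, and that selection step is not reproduced here.

```python
from collections import defaultdict


def weighted_context(tokens, target, weights):
    """Accumulate a context vector for `target` (hypothetical helper).

    weights[d - 1] is the contribution of a word found d positions away
    from an occurrence of the target. Distances beyond len(weights)
    contribute nothing.
    """
    context = defaultdict(float)
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        for d, w in enumerate(weights, start=1):
            # Look d positions to the left and to the right of the target.
            for j in (i - d, i + d):
                if 0 <= j < len(tokens):
                    context[tokens[j]] += w
    return dict(context)


# Illustrative input; the weights below are made up, not learned.
tokens = "the bank raised interest rates near the river bank".split()

# Fixed window of size 2 with uniform weights.
uniform = weighted_context(tokens, "bank", [1.0, 1.0])

# Wider window whose influence decays with distance.
decaying = weighted_context(tokens, "bank", [1.0, 0.5, 0.25])
```

With a weight per distance, the hard window boundary becomes a soft one: distant words can still contribute, but only as much as their weight allows.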


Similar resources

Widening the HolSum Search Scope

We investigate different areas of the high-dimensional vector space built by the automatic text summarizer HolSum, which evaluates sets of summary candidates using their similarity to the original text. Previously, the search for a good summary was constrained to a very limited area of the summary space. Since an exhaustive search is not reasonable, we have sampled new parts of the space using ra...


The BAF: A Corpus of English-French Bitext

The BAF is a corpus of English and French translations, hand-aligned at the sentence level, which was developed by the University of Montreal's RALI laboratory, within the "Action de recherche concertée" (ARC) A2, a cooperative research project initiated and financed by the AUPELF-UREF. The corpus, which totals approximately 800 000 words, is primarily intended as an evaluation tool in the deve...


A survey on Automatic Text Summarization

Text summarization endeavors to produce a summary version of a text while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. Sifting through such a massive amount of data to extract the useful information is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...


A Two-Stage Split-and-Select Model for Automatic Indexing of Persian Texts

Purpose: Each language has its own problems, which calls for automatic indexing models suited to each language. These models should address the exhaustivity and specificity of indexing. This paper introduces and evaluates a model suited to Persian automatic indexing. The model proposes to break the text into particles of candidate terms and to c...


From Text to Images: Weighting Schemes for Image Retrieval

Bags of visual words have been the most studied image description technique in recent years. This representation of images has raised new possibilities as well as new research issues. In particular, it is important to automatically determine which visual words are the most relevant to describe the images, and which ones should be ignored. This issue is a classical problem of textual information retriev...




Publication date: 2010